Estimating Activity based on Mobility Data#

Less movement typically means less economic activity. Understanding where and when population movement occurs can help inform disaster response and public policy, especially during crises.

Similar to the COVID-19 Community Mobility Reports, Facebook Population During Crisis and Mapbox Movement Data, we generate a series of crisis-relevant metrics, including the sampled baseline population, percent change in population and z-score. The metrics are calculated by counting the devices from a mobility data panel observed in each tile during each time period and comparing that count to a baseline period.

The following map shows the z-score on each tile for each time period. The z-score is the number of standard deviations by which a data point diverges from the baseline mean; in other words, it indicates whether the change in population for that area is statistically different from the baseline period.

Data#

Area of Interest#

In this step, we import the clipping boundary and the H3 tessellation defined by area(s) of interest below.

import geopandas

AOI = geopandas.read_file("../../data/interim/tessellation/SYRTUR_tessellation.gpkg")

Mobility Data#

Through the Development Data Partnership, the project team obtained a longitudinal panel of mobility data from which the metrics are calculated, including the percent change, z-score and the proposed activity index. The metrics are calculated by aggregating the number of devices within the area of interest in each tile and at each time period. For additional information, please see Data and Methodology.

Note

Due to the data volume and velocity (updated daily), the computation took place on AWS on an EC2 instance owned by the project team. The resulting aggregation is the tabulation of the device count for each hex_id and date.
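The aggregation described in the note can be sketched as follows. This is a minimal illustration, not the project's pipeline: the raw schema (`device_id`, `hex_id`, `date`) and the choice to count distinct devices rather than raw observations are assumptions, and the rows are made up.

```python
import pandas as pd

# Hypothetical raw observations: one row per device sighting, already
# assigned to an H3 tile and a UTC date (illustrative values only).
pings = pd.DataFrame(
    {
        "device_id": ["d1", "d1", "d2", "d3", "d3"],
        "hex_id": ["862da898fffffff"] * 3 + ["862dae96fffffff"] * 2,
        "date": pd.to_datetime(
            ["2023-01-02", "2023-01-02", "2023-01-02", "2023-01-02", "2023-01-03"]
        ),
    }
)

# Count distinct devices per tile and day (assumption: a device seen
# several times in the same tile on the same day is counted once).
counts = (
    pings.groupby(["hex_id", "date"])["device_id"]
    .nunique()
    .rename("count")
    .reset_index()
)
print(counts)
```

The resulting frame has one row per `hex_id` and `date`, which is the shape the rest of the notebook expects from the activity file.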

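import pandas as pd

# Device counts per hex_id and date, produced by the aggregation described above
ACTIVITY = pd.read_csv(
    "../../data/interim/SYRTUR_activity_index.csv", parse_dates=["date"]
)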

Additionally, we create a weekday column (Monday = 0, Sunday = 6) that will come in handy later on.

ACTIVITY["weekday"] = ACTIVITY["date"].dt.weekday

Methodology#

The methodology presented consists of generating a series of crisis-relevant metrics, including the sampled baseline population, percent change in population and z-score based on the number of devices in an area at a time. The device count is determined for each tile and for each time period, as defined by data standards and the spatial and temporal aggregations below. Similar approaches have been adopted, such as in []. The metrics may reveal movement trends in the population that may indicate more or less activity.

Data Standards#

Population Sample#

The population sample is composed of GPS-enabled devices drawn from a longitudinal mobility data panel. It is important to emphasize that the population sample is obtained via convenience sampling and that the mobility data panel represents a subset of the total population in an area at a time, specifically only users who turned on location tracking on their mobile device. Thus, derived metrics do not represent the total population density.

Spatial Aggregation#

The metrics are spatially aggregated on H3 tiles at resolution 6, which corresponds to an average area of approximately \(36\ \text{km}^2\) per tile.


Illustration of H3 tiles at resolution 6 near Gaziantep, Türkiye. Gaziantep is among the areas most affected by the 2023 Türkiye–Syria Earthquake; the 2200-year-old Gaziantep Castle was destroyed after two massive earthquakes hit Türkiye.

Temporal Aggregation#

The metrics are temporally aggregated daily in Coordinated Universal Time (UTC).
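Aggregating by UTC day means that an observation recorded shortly after local midnight can fall on the previous calendar day. The sketch below illustrates this with made-up timestamps carrying a +03:00 offset; it is an illustration of the convention, not the project's ingestion code.

```python
import pandas as pd

# Illustrative timestamps with an explicit UTC+3 offset.
local = pd.Series(["2023-02-06T01:30:00+03:00", "2023-02-06T04:17:00+03:00"])

# Convert to UTC, then truncate to the start of the UTC day.
utc = pd.to_datetime(local, utc=True)
utc_day = utc.dt.floor("D")

print(utc_day.dt.strftime("%Y-%m-%d").tolist())
```

The first timestamp lands on 2023-02-05 in UTC even though its local date is 2023-02-06, which is worth keeping in mind when comparing daily metrics against locally dated events.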

Implementation#

Calculate BASELINE#

For this experiment, we choose the 4-week period spanning January 2, 2023 to January 29, 2023 as the baseline. The baseline is calculated for each tile and for each time period, according to the spatial and temporal aggregations.

BASELINE = ACTIVITY[ACTIVITY["date"].between("2023-01-02", "2023-01-29")]
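As a quick sanity check on the chosen window, the snippet below (a sketch that uses only the two dates stated above) confirms that the baseline covers 28 days and exactly four occurrences of each weekday. Each per-weekday baseline for a tile therefore rests on at most four observations.

```python
import pandas as pd

# The baseline window stated in the text, both ends inclusive.
days = pd.date_range("2023-01-02", "2023-01-29", freq="D")

# Number of occurrences of each weekday (Monday = 0, ..., Sunday = 6).
per_weekday = pd.Series(days.weekday).value_counts().sort_index()

print(len(days))
print(per_weekday.to_dict())
```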

In fact, the result is 7 different baselines for each tile, one per day of the week. We calculate the mean and the standard deviation of the device count for each tile and for each day of the week.

MEAN = BASELINE.groupby(["hex_id", "weekday"]).agg({"count": ["mean", "std"]})
MEAN.columns = MEAN.columns.map(".".join)  # flatten to "count.mean", "count.std"

Taking a sneak peek,

MEAN[MEAN.index.get_level_values("hex_id").isin(["862da898fffffff"])]
hex_id           weekday  count.mean    count.std
862da898fffffff  0           5819.75  2285.557901
862da898fffffff  1           6675.25  1918.023527
862da898fffffff  2           7020.00  2137.928281
862da898fffffff  3           6586.00  2345.257484
862da898fffffff  4           5671.50  2838.529490
862da898fffffff  5           6300.00  2516.413718
862da898fffffff  6           6891.75  2462.698029

Calculate Z-Score#

A z-score is a statistical measure that tells how far above or below the mean (average) of a group of data points a particular data point lies, expressed in standard deviations. It is used to standardize data and make meaningful comparisons between different sets of data, for example between a densely populated tile and a sparsely populated one. A z-score is particularly useful when working with approximately normally distributed data, because its magnitude can then be read as how unusual an observation is relative to the baseline. Percent change does not provide this information.
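Written out, and assuming the per-tile standardization used in the implementation that follows (one scaler per tile, fitted on all baseline days of that tile), the z-score can be expressed as below. The symbols are introduced here for illustration: \(n_{i,t}\) is the device count in tile \(i\) on day \(t\), and \(B_i\) is the set of baseline days on which tile \(i\) has an observation.

```latex
\[
z_{i,t} = \frac{n_{i,t} - \mu_i}{\sigma_i},
\qquad
\mu_i = \frac{1}{|B_i|} \sum_{s \in B_i} n_{i,s},
\qquad
\sigma_i = \sqrt{\frac{1}{|B_i|} \sum_{s \in B_i} \left( n_{i,s} - \mu_i \right)^2}
\]
```

Here \(\sigma_i\) is the population standard deviation (divisor \(|B_i|\)), which is what scikit-learn's `StandardScaler` uses.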

Creating StandardScaler for each hex_id,

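from sklearn.preprocessing import StandardScaler

# One scaler per tile, fitted on all baseline days of that tile
scalers = {}

for hex_id in BASELINE["hex_id"].unique():
    scaler = StandardScaler()
    scaler.fit(BASELINE[BASELINE["hex_id"] == hex_id][["count"]])

    scalers[hex_id] = scaler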

Joining with AOI,

ACTIVITY = ACTIVITY.merge(AOI, how="left", on="hex_id").drop(["geometry"], axis=1)

Joining with the per-weekday baseline statistics (MEAN),

ACTIVITY = pd.merge(ACTIVITY, MEAN, on=["hex_id", "weekday"], how="left")
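Because this is a left join, any tile and weekday combination that never appears in the baseline period ends up with a missing baseline, and the derived metrics become NaN. A minimal sketch of how such rows could be surfaced with the `indicator` option of `merge` is shown below; the frames and values are toy data, not the project's.

```python
import pandas as pd

# Toy activity and baseline frames (illustrative values only).
activity = pd.DataFrame(
    {"hex_id": ["a", "a", "b"], "weekday": [0, 1, 5], "count": [12, 9, 1]}
)
baseline = pd.DataFrame(
    {"hex_id": ["a", "a"], "weekday": [0, 1], "count.mean": [10.0, 8.0]}
)

# indicator=True adds a "_merge" column recording where each row matched.
merged = activity.merge(
    baseline, on=["hex_id", "weekday"], how="left", indicator=True
)

# Rows present in the activity data but absent from the baseline.
missing = merged[merged["_merge"] == "left_only"]
print(missing[["hex_id", "weekday", "count"]])
```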

Preparing columns,

ACTIVITY["n_baseline"] = ACTIVITY["count.mean"]
ACTIVITY["n_difference"] = ACTIVITY["count"] - ACTIVITY["n_baseline"]

Additionally, we calculate the percent change. While the z-score offers more robustness to outliers and numerical stability, the percent change can be used when interpretability is most important.

ACTIVITY["percent_change"] = 100 * (ACTIVITY["count"] / (ACTIVITY["n_baseline"]) - 1)
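As a worked check of the formula, the sketch below recomputes the percent change for two rows of tile 862da898fffffff from the sample output shown later in this section (weekday baseline of 7020.0). The helper `percent_change` is a hypothetical name introduced only for this illustration.

```python
def percent_change(count: float, baseline: float) -> float:
    """Percent change of `count` relative to a non-zero `baseline`."""
    return 100 * (count / baseline - 1)


# Counts and baseline taken from the sample output for tile 862da898fffffff.
drop = percent_change(4689, 7020.0)  # 2023-02-08
rise = percent_change(7400, 7020.0)  # 2023-03-01

print(round(drop, 6), round(rise, 6))
```

Both values agree with the `percent_change` column of the sample output (-33.205128 and 5.413105).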

Calculating z_score,

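for hex_id, scaler in scalers.items():
    predicate = ACTIVITY["hex_id"] == hex_id
    try:
        # Standardize the tile's counts against its own baseline mean and std
        score = scaler.transform(ACTIVITY.loc[predicate, ["count"]])
    except ValueError:
        # Skip tiles whose counts cannot be transformed (e.g., no matching rows)
        continue
    ACTIVITY.loc[predicate, "z_score"] = score.ravel()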
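The per-tile loop can also be expressed without scikit-learn. The sketch below is an alternative formulation on toy data, under the assumption that the goal is the same standardization: each tile's counts are centred on that tile's baseline mean and divided by its population standard deviation (`ddof=0`, which is what `StandardScaler` uses; the pandas default is `ddof=1`). The frame and column names `mu` and `sigma` are introduced here for illustration.

```python
import pandas as pd

# Toy data (illustrative values only, not from the project dataset).
activity = pd.DataFrame(
    {
        "hex_id": ["a", "a", "a", "a", "b", "b", "b"],
        "date": pd.to_datetime(
            [
                "2023-01-02", "2023-01-03", "2023-01-04", "2023-02-08",
                "2023-01-02", "2023-01-03", "2023-02-08",
            ]
        ),
        "count": [10, 20, 30, 5, 4, 8, 6],
    }
)
baseline = activity[activity["date"].between("2023-01-02", "2023-01-29")]

# Per-tile baseline mean and population standard deviation.
stats = baseline.groupby("hex_id")["count"].agg(
    mu="mean", sigma=lambda s: s.std(ddof=0)
)

# Attach the tile statistics to every row and standardize.
scored = activity.join(stats, on="hex_id")
scored["z_score"] = (scored["count"] - scored["mu"]) / scored["sigma"]
print(scored[["hex_id", "date", "count", "z_score"]])
```

Note that, as in the loop above, the z-score is relative to the tile's overall baseline mean, whereas `n_baseline` and `percent_change` use the weekday-specific mean. A tile with a single baseline observation, or with constant counts, would have a zero `sigma` in this sketch and would need separate handling.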

Taking a sneak peek,

hex_id date count n_baseline n_difference percent_change z_score ADM0_PCODE ADM1_PCODE ADM2_PCODE
45358 862da898fffffff 2023-03-01 7400 7020.0 380.0 5.413105 0.462691 TR TUR027 TUR027008
39511 862da898fffffff 2023-02-15 5740 7020.0 -1280.0 -18.233618 -0.323831 TR TUR027 TUR027008
34538 862da898fffffff 2023-02-08 4689 7020.0 -2331.0 -33.205128 -0.821804 TR TUR027 TUR027008
24324 862da898fffffff 2023-01-25 5905 7020.0 -1115.0 -15.883191 -0.245653 TR TUR027 TUR027008
24331 862da898fffffff 2023-02-01 5799 7020.0 -1221.0 -17.393162 -0.295877 TR TUR027 TUR027008
... ... ... ... ... ... ... ... ... ... ...
58792 862dae96fffffff 2023-03-04 1 NaN NaN NaN -0.707107 TR TUR031 TUR031010
58793 862dae96fffffff 2023-03-10 2 NaN NaN NaN 1.414214 TR TUR031 TUR031010
58794 862dae96fffffff 2023-03-11 2 NaN NaN NaN 1.414214 TR TUR031 TUR031010
58797 862dae977ffffff 2023-03-04 1 NaN NaN NaN -0.500000 TR TUR031 TUR031011
58798 862dae977ffffff 2023-03-11 1 NaN NaN NaN -0.500000 TR TUR031 TUR031011

58802 rows × 10 columns

Results#

Limitations#

The methodology presented is an investigative pilot that aims to shed light on the economic situation in Syria and Türkiye by leveraging alternative data, when confronted with the absence of traditional data and methods.

Caution

Beyond the fact that the methodology is still pending peer review, the limitations can be summarized as follows.

  • The methodology relies on private intent data in the form of mobility data. In other words, the input data was not produced or collected with the primary goal of analyzing the population of interest or addressing the research question, but was repurposed for the public good. The benefits and caveats of using private intent data have been extensively discussed in the World Development Report 2021 [].

  • On the one hand, the mobility data panel is spatially and temporally granular and readily available. On the other hand, it is built through convenience sampling, which constitutes an important source of bias. The panel composition is not entirely known and is susceptible to change, and neither the data collection nor the composition of the mobility data panel can be controlled.

  • In summary, the results cannot be generalized to the entirety of population movement, but they can potentially provide information on movement patterns to inform the assessment of the Syrian economic situation, considering time constraints and the scarcity of traditional data sources in the context of Syria.

References#